The recent outbreak of severe respiratory infections associated with a novel group C betacoronavirus (HCoV-EMC) from Saudi Arabia has drawn global attention to another highly probable “SARS-like” animal-to-human interspecies jumping event in coronavirus (CoV). The genome of HCoV-EMC is most closely related to Tylonycteris bat coronavirus HKU4 (Ty-BatCoV HKU4) and Pipistrellus bat coronavirus HKU5 (Pi-BatCoV HKU5), which we discovered in 2006. Phylogenetically, HCoV-EMC clusters with Ty-BatCoV HKU4/Pi-BatCoV HKU5 with high bootstrap support, indicating that HCoV-EMC is a group C betaCoV. The major difference between HCoV-EMC and Ty-BatCoV HKU4/Pi-BatCoV HKU5 is in the region between S and E, where HCoV-EMC possesses five ORFs (NS3a-NS3e) instead of four, with low (31%-62%) amino acid identities to Ty-BatCoV HKU4/Pi-BatCoV HKU5. Comparison of the seven conserved replicase domains for species demarcation shows that HCoV-EMC is a novel CoV species. More intensive surveillance studies in bats and other animals may reveal the natural host of HCoV-EMC.

The recent outbreak of severe respiratory tract infections associated with a novel human group C betacoronavirus originating from Saudi Arabia has drawn global attention to another highly probable “SARS-like” interspecies jumping event of coronavirus (CoV) from animal to human. In June 2012, a novel CoV was isolated using Vero cells from the lung tissue of a 60-year-old resident of Saudi Arabia with fatal acute pneumonia and renal failure. In September 2012, a 49-year-old male resident of Qatar with severe acute pneumonia, renal failure and a recent history of travel to Saudi Arabia was admitted to an intensive care unit in Qatar. RT-PCR and sequencing of a short fragment of RNA-dependent RNA polymerase (RdRp) confirmed the presence of the same CoV as detected in the first Saudi Arabian case [1]. Complete genome sequencing of the virus isolated from the first patient was performed by Fouchier et al. at the Erasmus University Medical Centre, the Netherlands, and the sequence was released on September 28, 2012 (GenBank accession no. JX869059, named human betacoronavirus 2c EMC/2012). So far, there is no evidence of human-to-human transmission. The source of the virus remains obscure. In this article, this novel human group C betaCoV is abbreviated as HCoV-EMC.

After the SARS epidemic, we started to focus on CoV biodiversity, genomics and phylogeny and built up a map of CoV evolution. Before 2003, there were fewer than 10 CoVs with complete genomes available, including two human CoVs, human coronavirus 229E (HCoV-229E) and human coronavirus OC43 (HCoV-OC43). By September 2012, the number of CoVs with complete genomes sequenced had tripled and included two additional human CoVs, human coronavirus NL63 (HCoV-NL63) and human coronavirus HKU1 (HCoV-HKU1) [2,3]. Traditionally, CoVs were classified into groups 1, 2 and 3. In 2011, the Coronavirus Study Group of the International Committee on Taxonomy of Viruses reclassified these three groups of CoVs as three genera, Alphacoronavirus, Betacoronavirus and Gammacoronavirus; and we have discovered a fourth genus of CoV, Deltacoronavirus, which includes at least nine avian CoVs and a porcine coronavirus HKU15 [4,5]. The betaCoVs are further subclassified into group A, including HCoV-HKU1, HCoV-OC43, bovine coronavirus (BCoV), sable antelope coronavirus, giraffe coronavirus, equine coronavirus, porcine hemagglutinating encephalomyelitis virus, murine hepatitis virus, rat coronavirus and rabbit coronavirus HKU14 (RbCoV HKU14) [6]; group B, including the human and civet SARS-related CoVs (SARSr-CoV) and SARS-related Rhinolophus bat coronavirus (SARSr-Rh-BatCoV) [7,8]; group C, including Tylonycteris bat coronavirus HKU4 (Ty-BatCoV HKU4) and Pipistrellus bat coronavirus HKU5 (Pi-BatCoV HKU5), which we discovered in 2006 [9,10]; and group D, including Rousettus bat coronavirus HKU9 (Ro-BatCoV HKU9) [10,11]. In addition to Ty-BatCoV HKU4 and Pi-BatCoV HKU5, other group C bat betaCoVs are probably also present, but their complete genome sequences are not available [12,13]. Based on the CoVs discovered, we have constructed a model of CoV evolution, with evidence supporting bat CoVs as the gene source of alphaCoVs and betaCoVs and avian CoVs as the gene source of gammaCoVs and deltaCoVs [5]. All this work has laid down an evolutionary map for rapid phylogenetic and bioinformatics analyses of HCoV-EMC. The diversity of CoVs is a result of the infidelity of RdRp, which makes CoV genomes especially plastic; a high frequency of homologous RNA recombination due to their unique random template switching during RNA replication; and their large genomes. In addition to biodiversity, a number of natural recombination and possible interspecies jumping events have also been documented in betaCoVs [6,11,14-18]. For group A betaCoVs, molecular clock analysis has shown that HCoV-OC43 is a relatively recent zoonotic virus of bovine origin that emerged around 1890, likely from bovine-to-human transmission [17].
We have also recently discovered RbCoV HKU14, closely related to other members of the species Betacoronavirus 1 including HCoV-OC43 and BCoV, with recombination events that may have played a role in interspecies transmission of these HCoV-OC43-related viruses between humans, cattle, rabbits, swine and horses [6]. Despite having circulated in humans for more than a century, HCoV-OC43 is still evolving, with the recent emergence of a novel genotype due to natural recombination [15]. For group B betaCoVs, SARSr-CoV is believed to have been transmitted from civets to humans, although the horseshoe bat was likely the primary host [7,8]. Civet SARSr-CoV was also likely a recombinant virus arising from different strains of SARSr-Rh-BatCoV from different geographical locations in China [14,16]. Although no interspecies transmission events have been documented in group D betaCoVs, we have identified recombination events between different Ro-BatCoV HKU9 strains from different bat individuals, which may have allowed for the generation of different genotypes [11]. While these findings support the view that betaCoVs have the propensity to recombine and cause interspecies transmission, such events were unknown in group C betaCoVs. As HCoV-EMC is most closely related to Ty-BatCoV HKU4 and Pi-BatCoV HKU5, it is important to study their genetic relatedness, which may provide clues as to whether bats are the possible origin, as for SARSr-CoV.

The genome characteristics and organization of HCoV-EMC are similar to those of Ty-BatCoV HKU4 and Pi-BatCoV HKU5. Ty-BatCoV HKU4 was discovered in lesser bamboo bats (Tylonycteris pachypus) and Pi-BatCoV HKU5 in Japanese pipistrelles (Pipistrellus abramus) in Hong Kong [9]. Both lesser bamboo bats and Japanese pipistrelles are insectivorous microbats found in China and some other parts of Asia. The genome of HCoV-EMC is 30 106 bases long, slightly smaller than those of Ty-BatCoV HKU4 (30 286 to 30 316 bases) and Pi-BatCoV HKU5 (30 482 to 30 488 bases), and its G+C content is 41%, between those of Ty-BatCoV HKU4 (38%) and Pi-BatCoV HKU5 (43%). The replicase ORF1ab occupies 21.5 kb of the genome. This ORF encodes 16 putative non-structural proteins, including nsp3 (which contains the putative papain-like protease (PLpro)), nsp5 (putative chymotrypsin-like protease (3CLpro)), nsp12 (putative RdRp), nsp13 (putative helicase (Hel)) and other proteins of unknown function. These proteins are produced by proteolytic cleavage of the large replicase polyprotein by PLpro and 3CLpro at specific sites, which are conserved with those in Ty-BatCoV HKU4 and/or Pi-BatCoV HKU5 (Table 1).
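
The G+C content quoted above is a simple base-composition statistic. The following is a minimal sketch of how such a figure could be computed from a nucleotide sequence; the function name `gc_content` and the short `toy` fragment are illustrative assumptions, not data or methods from this study.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of a nucleotide sequence (DNA or RNA letters)."""
    # Count only unambiguous bases; gaps and ambiguity codes are ignored.
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    gc = sum(base in "GC" for base in counted)
    return gc / len(counted)


# Hypothetical 20-base fragment, used only to exercise the function.
toy = "ACGAACUUGGCAUAUGCAAU"
print(f"length = {len(toy)} bases, G+C = {gc_content(toy):.0%}")
```

Applied to a complete genome (for example, one downloaded under the GenBank accession given earlier), the same function would return the whole-genome G+C fraction.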

HCoV-EMC has the same basic genome structure as Ty-BatCoV HKU4 and Pi-BatCoV HKU5 (Figure 1). It also possesses the same putative transcription regulatory sequence (TRS) motif, 5’-ACGAAC-3’, as Ty-BatCoV HKU4 and Pi-BatCoV HKU5; the motif is located at the 3’ end of the leader sequence and precedes each ORF except NS3c, NS3e and N. This TRS has also been shown to be the TRS for other group B, C and D betaCoVs. The TRS for N is 5’-ACGAAU-3’. Similar to other group B, C and D betaCoVs, the genome of HCoV-EMC has a putative PLpro, which is homologous to PL2pro of alphaCoVs and group A betaCoVs and PLpro of gammaCoVs and deltaCoVs. As in Ty-BatCoV HKU4 and Pi-BatCoV HKU5, no proteolytic cleavage site is present in S of HCoV-EMC. All cysteine residues in S of HCoV-EMC, Ty-BatCoV HKU4 and Pi-BatCoV HKU5 are conserved. In contrast to the genomes of Ty-BatCoV HKU4 and Pi-BatCoV HKU5, which contain four ORFs that encode putative non-structural proteins (NS3a, NS3b, NS3c and NS3d) between S and E, this region of HCoV-EMC contains five ORFs that encode putative non-structural proteins NS3a, NS3b, NS3c, NS3d and NS3e (Figure 1). This is the region of HCoV-EMC with the lowest amino acid identities to Ty-BatCoV HKU4 and Pi-BatCoV HKU5. NS3a, NS3b and NS3c of HCoV-EMC possess 42%-43%, 41%-47% and 31% amino acid identities to NS3a, NS3b and NS3c of Ty-BatCoV HKU4 and Pi-BatCoV HKU5, respectively. NS3d of HCoV-EMC is homologous to amino acids 1 to 110/


Table 1 Characteristics of putative non-structural proteins of ORF1ab in Ty-BatCoV HKU4, Pi-BatCoV HKU5 and HCoV-EMC

Amino acid positions (first residue - last residue)

nsp    Putative function/domain (a)                  Ty-BatCoV HKU4   Pi-BatCoV HKU5   HCoV-EMC
nsp1   Unknown                                       1-195            1-195            1-193
nsp2   Unknown                                       196-847          196-851          194-853
nsp3   Putative PLpro domain                         848-2784         852-2829         854-2739
nsp4   Hydrophobic domain                            2785-3291        2830-3337        2740-3247
nsp5   3CLpro                                        3292-3597        3338-3643        3248-3553
nsp6   Hydrophobic domain                            3598-3889        3644-3935        3554-3845
nsp7   Unknown                                       3890-3972        3936-4018        3846-3928
nsp8   Unknown                                       3973-4171        4019-4217        3929-4127
nsp9   Unknown                                       4172-4281        4218-4327        4128-4237
nsp10  Unknown                                       4282-4420        4328-4466        4238-4377
nsp11  Unknown (short peptide at the end of ORF1a)   4421-4434        4467-4480        4378-4391
nsp12  RdRp                                          4421-5354        4467-5400        4378-5310
nsp13  Hel                                           5355-5952        5401-5998        5311-5908
nsp14  ExoN                                          5953-6475        5999-6522        5909-6432
nsp15  XendoU                                        6476-6817        6523-6871        6433-6775
nsp16  2’-O-MT                                       6818-7119        6872-7179        6776-7078

(a) Abbreviations: PLpro, papain-like protease; 3CLpro, chymotrypsin-like protease; RdRp, RNA-dependent RNA polymerase; Hel, helicase; ExoN, 3’-to-5’ exonuclease; XendoU, poly(U)-specific endoribonuclease; 2’-O-MT, S-adenosylmethionine-dependent 2’-O-ribose methyltransferase.
Figure 1 Genome organizations of HCoV-EMC and other betaCoVs. Papain-like proteases (PL1pro, PL2pro and PLpro), chymotrypsin-like protease (3CLpro) and RNA-dependent RNA polymerase (RdRp) are represented by orange boxes. Haemagglutinin esterase (HE), spike (S), envelope (E), membrane (M) and nucleocapsid (N) are represented by green boxes. Putative accessory proteins are represented by blue boxes. HCoV-EMC is shown in bold.


103 of NS3d in Ty-BatCoV HKU4 and Pi-BatCoV HKU5 (35%-49% amino acid identities), with a stop codon UAG present at nucleotide position 27 160, leading to premature termination. NS3e of HCoV-EMC is homologous to amino acids 116/122 to 223/227 of NS3d in Ty-BatCoV HKU4 and Pi-BatCoV HKU5 (60%-62% amino acid identities). NS3c and NS3e do not possess any TRS or internal ribosomal entry site. BLAST search revealed no amino acid similarities between these putative non-structural proteins and other known proteins, and no functional domains were identified by PFAM and InterProScan. TMHMM and TMpred analyses show one and two putative transmembrane domains in NS3a (residues 9 to 29) and NS3d (residues 36 to 56 and 71 to 91), respectively. Similar to Ty-BatCoV HKU4 and Pi-BatCoV HKU5, the 3’ untranslated region of the genome of HCoV-EMC contains predicted bulged stem-loop structures 16 to 76 nucleotides downstream of the N gene. Downstream of the bulged stem-loop structure, 97 to 121 nucleotides downstream of the N gene, a pseudoknot structure is present. Bootscan analysis did not show any recombination between HCoV-EMC, Ty-BatCoV HKU4 and Pi-BatCoV HKU5.
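
The putative TRS motif 5’-ACGAAC-3’ described in this section can be located in a genome by a plain string scan. The sketch below is illustrative only: `find_motif` and the `toy` sequence are hypothetical, and a scan of this kind merely lists candidate sites. Deciding which hits are functional TRSs (that is, which precede an ORF) requires the genome annotation as well.

```python
import re

TRS_CORE = "ACGAAC"  # putative TRS motif quoted in the text


def find_motif(genome: str, motif: str = TRS_CORE) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) motif hit."""
    # Normalise case and treat RNA (U) and DNA (T) spellings as equivalent.
    seq = genome.upper().replace("U", "T")
    pattern = re.escape(motif.upper().replace("U", "T"))
    # A zero-width lookahead lets overlapping occurrences be reported too.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]


# Hypothetical fragment containing three copies of the motif, two overlapping.
toy = "GGACGAACUUAAGCACGAACGAACUU"
print(find_motif(toy))
```

The variant reported for N, 5’-ACGAAU-3’, can be searched for by passing it as the `motif` argument.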

The phylogenetic trees constructed using the amino acid sequences of the 3CLpro, RdRp, Hel, S and N of HCoV-EMC and other CoVs are shown in Figure 2. For all five genes, HCoV-EMC clusters with Ty-BatCoV HKU4 and Pi-BatCoV HKU5 with high bootstrap support, indicating that HCoV-EMC is a group C betaCoV (Figure 2). Although HCoV-EMC appears to cluster with Pi-BatCoV HKU5 in the trees constructed using RdRp and Hel, the bootstrap supports were only 652 and 588 (out of 1000 trees), respectively, suggesting that HCoV-EMC is not clearly more closely related to either Ty-BatCoV HKU4 or Pi-BatCoV HKU5 than to the other. Comparison of the amino acid identities of the seven conserved replicase domains for species demarcation (ADRP, nsp5 (3CLpro), nsp12 (RdRp), nsp13 (Hel), nsp14 (ExoN), nsp15 (NendoU) and nsp16 (2’-O-MT)) between HCoV-EMC, Ty-BatCoV HKU4 and Pi-BatCoV HKU5 showed less than 90% identity in four of the seven domains (ADRP 68%-69%, nsp5 81%-83%, nsp15 76%-80% and nsp16 84%-85%), indicating that HCoV-EMC is a novel CoV species. For nsp12, nsp13 and nsp14, there are 90%-92%, 92%-94% and 86%-92% amino acid identities, respectively, between HCoV-EMC and Ty-BatCoV HKU4/Pi-BatCoV HKU5.
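
The species-demarcation comparison above amounts to checking each conserved replicase domain against the 90% identity threshold. A minimal sketch of that bookkeeping follows, using only the identity ranges quoted in the text; the dictionary layout and the function name `domains_below` are illustrative assumptions rather than the authors' procedure.

```python
# Amino acid identity ranges (%) between HCoV-EMC and Ty-BatCoV HKU4/Pi-BatCoV HKU5,
# as (lowest, highest) values quoted in the text.
IDENTITY_RANGES = {
    "ADRP": (68, 69),
    "nsp5 (3CLpro)": (81, 83),
    "nsp12 (RdRp)": (90, 92),
    "nsp13 (Hel)": (92, 94),
    "nsp14 (ExoN)": (86, 92),
    "nsp15 (NendoU)": (76, 80),
    "nsp16 (2'-O-MT)": (84, 85),
}
THRESHOLD = 90.0  # percent identity used for species demarcation in the text


def domains_below(ranges, threshold=THRESHOLD):
    """List domains whose identity to both bat viruses is below the threshold."""
    # A domain counts only if even its highest identity falls short of the threshold.
    return [name for name, (_, high) in ranges.items() if high < threshold]


below = domains_below(IDENTITY_RANGES)
print(f"{len(below)} of {len(IDENTITY_RANGES)} domains below {THRESHOLD:.0f}%: {below}")
```

With these inputs the function picks out ADRP, nsp5, nsp15 and nsp16, the same four domains named in the text. nsp14 is not counted because its upper value (92%) exceeds the threshold.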

Using the sequences currently available and Yule process speciation under a relaxed clock model with an uncorrelated lognormal distribution, the mean evolutionary rate of betaCoVs was estimated at 2.37 × 10⁻⁴ nucleotide substitutions per site per year for the RdRp gene. Molecular clock analysis using the RdRp gene showed that HCoV-EMC diverged from the most recent common ancestor of group C betaCoVs at approximately year 941 (HPDs, 529 BC to 1878). By comparison, within the human and civet SARSr-CoV and SARSr-Rh-BatCoV cluster, the human/civet SARSr-CoV diverged from the
Figure 2 Phylogenetic analysis of HCoV-EMC. The trees were constructed by the neighbor-joining method using Kimura correction and bootstrap values calculated from 1000 trees. 318, 951, 600, 1491 and 510 amino acid positions in chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase (RdRp), helicase (Hel), spike (S) and nucleocapsid (N), respectively, were included in the analysis. For 3CLpro, S and N, the scale bars indicate the estimated number of substitutions per 20 amino acids. For RdRp and Hel, the scale bars indicate the estimated number of substitutions per 50 amino acids. PEDV, porcine epidemic diarrhea virus (NC_003436); Sc-BatCoV-512, Scotophilus bat coronavirus 512 (NC_009657); TGEV, transmissible gastroenteritis virus (NC_002306); FIPV, feline infectious peritonitis virus (AY994055); CCoV, canine coronavirus (GQ477367); PRCV, porcine respiratory coronavirus (DQ811787); Rh-BatCoV-HKU2, Rhinolophus bat coronavirus HKU2 (EF203064); Mi-BatCoV 1A, Miniopterus bat coronavirus 1A (NC_010437); Mi-BatCoV 1B, Miniopterus bat coronavirus 1B (NC_010436); Mi-BatCoV-HKU8, Miniopterus bat coronavirus HKU8 (NC_010438); Hi-BatCoV HKU10, Hipposideros bat coronavirus HKU10 (JQ989269); Ro-BatCoV HKU10, Rousettus bat coronavirus HKU10 (JQ989270); HCoV-229E, human coronavirus 229E (NC_002645); HCoV-NL63, human coronavirus NL63 (NC_005831); HCoV-OC43, human coronavirus OC43 (NC_005147); BCoV, bovine coronavirus (NC_003045); AntelopeCoV, sable antelope coronavirus (EF424621); GiCoV, giraffe coronavirus (EF424622); ECoV, equine coronavirus (NC_010327); PHEV, porcine hemagglutinating encephalomyelitis virus (NC_007732); MHV, murine hepatitis virus (NC_001846); RCoV, rat coronavirus (NC_012936); RbCoV HKU14, rabbit coronavirus HKU14 (NC_017083); HCoV-HKU1, human coronavirus HKU1 (NC_006577); Ty-BatCoV-HKU4, Tylonycteris bat coronavirus HKU4 (NC_009019); Pi-BatCoV-HKU5, Pipistrellus bat coronavirus HKU5 (NC_009020); SARS-CoV, SARS-related human coronavirus (NC_004718); SARSr-Rh-BatCoV HKU3, SARS-related Rhinolophus bat coronavirus HKU3 (DQ022305); SARSr CoV CFB, SARS-related Chinese ferret badger coronavirus (AY545919); SARSr-CiCoV, SARS-related palm civet coronavirus (AY304488); Ro-BatCoV-HKU9, Rousettus bat coronavirus HKU9 (NC_009021); IBV, infectious bronchitis virus (NC_001451); IBV-partridge, partridge coronavirus (AY646283); TCoV, turkey coronavirus (NC_010800); IBV-peafowl, peafowl coronavirus (AY641576); BWCoV-SW1, beluga whale coronavirus SW1 (NC_010646); ALCCoV, Asian leopard cat coronavirus (EF584908); BuCoV HKU11, bulbul coronavirus HKU11 (FJ376619); ThCoV HKU12, thrush coronavirus HKU12 (FJ376621); MunCoV HKU13, munia coronavirus HKU13 (FJ376622); PorCoV HKU15, porcine coronavirus HKU15 (NC_016990); WECoV HKU16, white-eye coronavirus HKU16 (NC_016991); SpCoV HKU17, sparrow coronavirus HKU17 (NC_016992); MRCoV HKU18, magpie robin coronavirus HKU18 (NC_016993); NHCoV HKU19, night heron coronavirus HKU19 (NC_016994); WiCoV HKU20, wigeon coronavirus HKU20 (NC_016995); CMCoV HKU21, common moorhen coronavirus HKU21 (NC_016996).


most recent common ancestor of the human/civet SARSr-CoV and SARSr-Rh-BatCoV at approximately year 1653 (HPDs, 1150 to 1968). By definition, the human and civet SARSr-CoV and SARSr-Rh-BatCoV are the same CoV species. These observations suggest that there should be one or more intermediate hosts between Ty-BatCoV HKU4, Pi-BatCoV HKU5 and HCoV-EMC. More strains of Ty-BatCoV HKU4, Pi-BatCoV HKU5 and HCoV-EMC, as well as other group C betaCoVs collected at different time points, should be sequenced to achieve a more accurate estimation of the divergence time.
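
To give a sense of scale for the divergence estimates, the quoted mean rate can be multiplied by the elapsed time to obtain an expected number of substitutions per site along a single lineage. This back-of-the-envelope sketch assumes a strict (constant-rate) clock and uses 2012 as the sampling year; it is not the relaxed-clock analysis described in the text, and the function name is hypothetical.

```python
RATE = 2.37e-4        # substitutions per site per year for RdRp (mean rate from the text)
TMRCA_YEAR = 941      # mean divergence estimate for HCoV-EMC from the text
SAMPLING_YEAR = 2012  # assumption: year in which the HCoV-EMC genome was sequenced


def expected_substitutions_per_site(rate: float, start_year: int, end_year: int) -> float:
    """Expected substitutions per site on one lineage under a constant rate."""
    return rate * (end_year - start_year)


elapsed = SAMPLING_YEAR - TMRCA_YEAR
expected = expected_substitutions_per_site(RATE, TMRCA_YEAR, SAMPLING_YEAR)
print(f"{elapsed} years x {RATE} = {expected:.3f} substitutions per site")
```

Because the HPD interval for the divergence date is wide (529 BC to 1878), the same calculation at its two ends gives very different values, which is one reason sequencing more strains is needed for a tighter estimate.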

In the last decade, we have already witnessed the discovery of two novel human CoVs and an animal-to-human CoV interspecies jumping event involving SARSr-CoVs. In contrast to HCoV-229E, HCoV-OC43, HCoV-NL63 and HCoV-HKU1, which are notoriously difficult to culture, HCoV-EMC and human SARS-CoV are both readily cultivable using primate cell lines. This may suggest a possible correlation between cultivability and virulence/recent interspecies jumping. Sequencing more genomes and performing evolutionary analysis will help us understand whether HCoV-EMC represents another recent interspecies jumping event from animal to human or another human CoV that has stably infected humans. Our most recent findings showed that CoVs can be transmitted between two bat species of different suborders, suggesting that different degrees of interspecies jumping can occur in nature [19]. More intensive surveillance studies for group C betaCoVs in bats and other animals may reveal the natural host of this novel human group C betaCoV. As coronaviruses are prone to recombination and mutation and it has been documented that different levels of interspecies jumping can indeed occur in nature, we should not underestimate the potential of coronaviruses to cause another major “SARS-like” pandemic.